ABSTRACT


A  finite  automaton  is  abstractly  represented  as  a 
set  function  from  one  finite  set  into  another.   Many  of 
the  problems  posed  for  finite  automata  are  then  simply 
described in terms of set functions with special properties.
Some elementary results on the existence and uniqueness of
such "discrimination functions" are presented.

As  a  related  example  a  finite  automaton  is  described 
which  can  recognize  a  large  variety  of  geometric  patterns 
(or characters) when displayed in a rather general way.

A  finite  automaton  which  is  essentially  a  "perceptron" 
is  described.   In  order  that  such  a  device  represent  a 
discrimination  function  it  is  shown  that  a  specific 
product  of  two  set  functions  must  also  represent  a 
discrimination  function.   Some  rather  severe  necessary 
conditions  for  solving  the  basic  discrimination  problem 
are  then  derived. 

Some  final  comments  on  approximate  discrimination 
and  generalizations  of  the  basic  formulation  are  given. 
FINITE AUTOMATA,
PATTERN RECOGNITION AND PERCEPTRONS

1. Introduction.

A  large  class  of  finite  automata  can  be  classified  as 
devices  which  exhibit  some  type  of  selective  responses  to 
parts of their "environment". In addition many automata
which  are  of  current  interest  are  intended  to  have  definite 
similarities  with,  or  to  be  in  some  sense  analogous  to  human 
nervous  systems  (insofar  as  the  latter  are  understood).   The 
all  too  frequent  overemphasis  on  these  aspects  of  automata, 
with  the  subsequent  morass  of  psychological  and  physiological 
terminology  introduced,  conceals  the  nature  of  the  basic 
(mathematical)  problem  which  must  be  considered.   The  first 
part  of  this  paper  (section  2)  presents  a  formulation  of 
the  general  problem  posed  by  many  automata,  namely:  to  find 
a specific set function or class of set functions. The
formulation presented can be easily extended to include a
more general class of automata (or discrimination problems)
than those explicitly considered; one such extension is
discussed in Section 7.

The problem of "recognizing" geometric patterns by
automata is considered in Section 3. A specific device
which solves such problems with some generality is described.
This example serves to yield insight into the formalities
introduced and discussed throughout the other sections of
the paper. The principles of this automaton are so
transparent that an anthropomorphic description of it, in such
terms as "concept formation", "cognitive system", etc., is
clearly not called for. However, if the device were described
only by its function, and the simple trick of its operation
were concealed, it would certainly qualify as an automaton
with similarities to certain types of human stimulus-response
reactions.

Sections 4 and 5 are concerned with a more or less
specifically defined device called a "perceptron". An
attempt is made to define a perceptron-like automaton as
a nerve-net (in the sense of Kleene [2] and von Neumann
[3]). Although there are some difficulties in this
formulation, due to vagueness and contradictions in the descriptions
of perceptrons in [4], it seems clear that the proposed
model  has  or  can  have  all  of  the  features  of  a  perceptron 
which  are  claimed  to  be  novel.   It  is  then  shown  that  such 
a  device  can  be  represented  at  any  instant  of  time  as  a  very 
special  type  of  set  function.   Some  questions  regarding  the 
possibility  of  solving  the  basic  discrimination  problem 
with  these  special  set  functions  are  then  considered. 

In  Section  6  a  concept  is  introduced  which  should  prove 
useful  in  treating  approximate  discrimination  problems. 

It is believed that much of the material in this paper
can be applied to automata constructed along the more
conventional lines of "McCulloch-Pitts nerve nets." These
applications will be reported on in the future.
2. Stimuli, Responses and Discrimination Functions.

Procedures for the digitalization of various types of
information are so well known that we merely assert here
the assumption that whatever "knowledge" an automaton is
to have of its external environment is in the form of a
finite bounded sequence of 0's and 1's (i.e., a binary
integer). Conceptually the device may be thought of as
having a finite number of input lines (sensors or input
neurons) which we may order in some arbitrary but fixed
manner. The input binary integer then represents
stimulation of those input lines which correspond to a 1 and
non-stimulation of the others.

Similarly the digital control of servos and other
"active" control mechanisms is sufficiently developed to
enable us to limit the response of an automaton to finite
binary integers. The output lines (response units or
effectors) are assumed to be finite in number and are
stimulated or non-stimulated in accordance with the
appearance of a 1 or 0 in the corresponding position in
the binary integer. With these heuristic notions in mind
we introduce

Definition 2.1. A binary vector, $x$, of dimension $n$
is a (column) vector of $n$ components, $x_i$, $i = 1, 2, \ldots, n$,
each of which is either 0 or 1.

Such binary vectors (or equivalently for some purposes,
binary integers) are the basic quantities in terms of which
the input, output and state of finite automata will be
described. The following facts concerning binary vectors
are elementary:

(i) The set of all $n$-dimensional binary vectors contains
$2^n$ elements.

(ii) The number of non-zero components common to two
$n$-dimensional binary vectors, $x$ and $y$, is given by
their (real) inner product

    $(x,y) = (y,x) = \sum_{i=1}^{n} x_i y_i$ .
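
The two facts above can be checked directly on a small case. The following is a minimal sketch in Python; the helper names `binary_vectors` and `inner` and the sample vectors are illustrative choices, not notation from the text.

```python
from itertools import product

def binary_vectors(n):
    """Return all n-dimensional binary vectors as tuples of 0s and 1s."""
    return list(product((0, 1), repeat=n))

def inner(x, y):
    """Real inner product (x, y) of two binary vectors of equal dimension."""
    return sum(xi * yi for xi, yi in zip(x, y))

x = (1, 0, 1, 1)
y = (1, 1, 0, 1)
common = inner(x, y)              # number of positions that are 1 in both
count = len(binary_vectors(4))    # fact (i): 2**n vectors of dimension n
```

Note that `inner(x, x)` is simply the number of non-zero components of `x`, a property used again in Section 3.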

Definition 2.2. Let $j$, $k$, and $n$ be positive integers and

    $S$ = the set of all $n$-dimensional binary vectors, $s$;
    $A$ = the set of all $j$-dimensional binary vectors, $a$;
    $R$ = the set of all $k$-dimensional binary vectors, $r$.

Definition 2.3. $F(S,R)$ = the set of all single valued functions,

    $f(s) = r$ ,

on $S$ to $R$; i.e. with domain $S$ and range in $R$.

The set $S$ is to be considered the set of all possible
stimuli to an automaton with $n$ input lines. The set $R$
is the set of all possible responses of an automaton with
$k$ output lines. Any finite automaton with $n$ input lines
and $k$ output lines then corresponds to some function
$f \in F(S,R)$; i.e. $F$ is equivalent to the set of all
such automata. If for any set $X$ we let $\eta(X)$ be the
number of elements in the set the above definitions yield

    $\eta(F) = [\eta(R)]^{\eta(S)} = 2^{k 2^n}$ .    (2.0)
Definition 2.4. Let $S_1, S_2, \ldots, S_m$ be $m \ge 1$
disjoint subsets of $S$, and

    $I_m = \{S_1, S_2, \ldots, S_m\}$ ,   $U_m = S_1 \cup S_2 \cup \ldots \cup S_m$ .

Let $r_1, r_2, \ldots, r_m$ be $m$ distinct elements of $R$, and

    $O_m = \{r_1, r_2, \ldots, r_m\}$ .

A) A function $f \in F(S,R)$ is a discrimination function
of $I_m$ with respect to $O_m$ if:

    $f(s) = r_\mu$ for all $s \in S_\mu$ , $\mu = 1, 2, \ldots, m$.

B) A function $f \in F(S,R)$ is a strong discrimination
function of $I_m$ w.r.t. $O_m$ if it is a discrimination
function of $I_m$ w.r.t. $O_m$ and in addition:

    $f(s) \notin O_m$ for all $s \notin U_m$.

C) $F_D(I_m, O_m)$ = the set of all discrimination functions of
$I_m$ w.r.t. $O_m$.

D) $F_{str.D}(I_m, O_m)$ = the set of all strong discrimination
functions of $I_m$ w.r.t. $O_m$.
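
Parts A) and B) of the definition can be illustrated by two small predicates. This is a sketch in Python under stated assumptions: a function $f$ is stored as a dictionary from stimuli to responses, the strong condition is taken to be "no stimulus outside the union of the classes is mapped onto one of the designated responses" (the reading consistent with the count (2.3)), and the toy classes and responses are hypothetical.

```python
def is_discrimination(f, classes, responses):
    """Part A: f(s) == r_mu for every s in S_mu."""
    return all(f[s] == r for S_mu, r in zip(classes, responses) for s in S_mu)

def is_strong_discrimination(f, classes, responses):
    """Part B: additionally, no stimulus outside U_m is mapped into O_m."""
    U = set().union(*classes)
    O = set(responses)
    return (is_discrimination(f, classes, responses)
            and all(f[s] not in O for s in f if s not in U))

# Toy problem with n = 2 input lines and k = 2 output lines (hypothetical data).
S1, S2 = {(0, 1)}, {(1, 0)}          # disjoint stimulus classes
r1, r2 = (0, 1), (1, 0)              # distinct responses
f = {(0, 0): (0, 0), (0, 1): r1, (1, 0): r2, (1, 1): (1, 1)}
g = dict(f)
g[(1, 1)] = r1                       # a spurious stimulus now triggers r1
```

Here `f` is a strong discrimination function, while `g` is an ordinary but not a strong one.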

Clearly a large class, if not all, of the desired
overall properties of a finite automaton can be formulated
in terms of the above notions of discrimination functions.
Thus if a device is to respond in one way to one class of
inputs and in a different way to a second class of inputs,
etc., it must correspond precisely to a discrimination
function. It is perhaps not "natural", if we are trying
to imitate human behaviour, to insist upon strong
discrimination. This would imply that certain responses can be
caused only by certain known stimuli (i.e. all hallucinations
and illusions would have to be anticipated). However strong
discrimination functions seem to play an important role in
constructive existence proofs. (They are in fact the kinds
of set functions implied by Kleene [2] and von Neumann [3]
where, however, $m$ is implicitly taken to be 2 and $k = 1$.
It would also seem that the class of problems more vaguely
formulated by F. Rosenblatt [4] is clarified by these
notions and again $m = 1$ or 2 in most of his explicit
discussions.)

The existence and uniqueness properties of
discrimination functions (and hence the possible existence and
uniqueness of the implied class of automata) are
contained in the following, essentially trivial, results.
It is assumed throughout this discussion that a definite
discrimination problem, characterized by $I_m$ and $O_m$,
is posed.

Tr. 2.1. A discrimination function of $I_m$ w.r.t. $O_m$
exists if and only if $m \le 2^k$.

Proof. From the definitions 2.2 and 2.4, since $O_m \subset R$,

    $m = \eta(O_m) \le \eta(R) = 2^k$ .    (2.1)

Thus the necessity is established. The sufficiency is
obvious as the definition 2.4 then becomes essentially
constructive.

Tr. 2.2. The following three statements are equivalent:

(a) A discrimination function of $I_m$ w.r.t. $O_m$ is
unique.

(b) $U_m = S$.

(c) $F_D(I_m, O_m) = F_{str.D}(I_m, O_m)$.

Proof. The equivalence between (a) and (b) follows from
"counting" the possible number of discrimination functions.
Since for all discrimination functions $f \in F_D(I_m, O_m)$,

    $f(U_m) = O_m$ ,

in an obvious notation, they can differ only in mapping
$(S - U_m)$ into $R$. The number of ways in which this can be
done, and hence the number of possible discrimination
functions, is

    $\eta(F_D) = [\eta(R)]^{\eta(S) - \eta(U_m)}$ .    (2.2)

Similarly to show the equivalence between (b) and (c)
we count the strong discrimination functions,

    $\eta(F_{str.D}) = [\eta(R) - m]^{\eta(S) - \eta(U_m)}$ ,    (2.3)

and the result follows on equating (2.2) and (2.3). (Note
that the above proof relies on the condition $m \ge 1$
which was imposed in Defn. 2.4. Kleene does not require
the equivalent of this condition [2] and is then forced to
consider the ensuing special trivial cases which correspond
to automata with no input or else no output.) The proof
is now complete.
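
The two counts used in the proof can be confirmed by brute-force enumeration on a very small problem. The following is a sketch in Python; the function name and the chosen classes and responses are illustrative, and "strong" is taken to mean that no stimulus outside the union of the classes is mapped onto a designated response.

```python
from itertools import product

def count_discrimination_functions(n, k, classes, responses):
    """Count ordinary and strong discrimination functions by enumeration."""
    S = list(product((0, 1), repeat=n))
    R = list(product((0, 1), repeat=k))
    U = set().union(*classes)
    O = set(responses)
    target = {s: r for S_mu, r in zip(classes, responses) for s in S_mu}
    ordinary = strong = 0
    for values in product(R, repeat=len(S)):   # every f in F(S, R)
        f = dict(zip(S, values))
        if all(f[s] == r for s, r in target.items()):
            ordinary += 1
            if all(f[s] not in O for s in S if s not in U):
                strong += 1
    return ordinary, strong

n, k = 2, 2
classes = [{(0, 1)}, {(1, 0)}]        # hypothetical S_1, S_2
responses = [(0, 1), (1, 0)]          # hypothetical r_1, r_2
ordinary, strong = count_discrimination_functions(n, k, classes, responses)
eta_S, eta_R, eta_U, m = 2 ** n, 2 ** k, 2, 2
```

For this case the enumeration agrees with the closed forms: `ordinary` equals `eta_R ** (eta_S - eta_U)` and `strong` equals `(eta_R - m) ** (eta_S - eta_U)`. When the classes exhaust $S$ both counts reduce to one.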

The above equation (2.3) and Tr. 2.2 clearly imply
the following result which has direct significance to the
analysis in [2] and [3] (since, as has been mentioned,
they take $m = 2$ and $k = 1$ which implies by (2.1) that
$R = O_m$).

Tr. 2.3. If $R = O_m$ a strong discrimination function of
$I_m$ w.r.t. $O_m$ exists if and only if $S = U_m$, and it is
then unique.

This result makes clear the possible difficulties and
care which must be taken in discussing general discrimination
problems while assuming $k = 1$ (i.e. only one output line
or effector). In fact if strong discrimination of only one
set $S_1 \subset S$ is desired the problem is identical to that of
finding any discrimination function of $I_2 = \{S_1, S - S_1\}$ w.r.t.
$O_2 = R$. Of course by Tr. 2.3 there is a unique such function.
Hence, speaking very loosely, it would seem unlikely, in
such cases, that an automaton constructed "mainly" by random
processes could yield the desired discrimination function.

Finally to obtain some notion of the relative "density"
of $F_D$ in $F$ we have

Tr. 2.4. The probability that a function $f$ chosen at
random from $F$ be a discrimination function of $I_m$ w.r.t. $O_m$
is

    $p(f \in F_D) = 2^{-k \eta(U_m)}$ .    (2.4)

Proof. The probability in question is just the ratio
$\eta(F_D)/\eta(F)$. Thus from (2.0), (2.2) and the fact that
the $S_\mu$ are disjoint:

    $p(f \in F_D) = \frac{\eta(F_D)}{\eta(F)}
                  = \frac{[\eta(R)]^{\eta(S) - \eta(U_m)}}{[\eta(R)]^{\eta(S)}}
                  = [\eta(R)]^{-\eta(U_m)} = 2^{-k \eta(U_m)}$ .

This result exhibits the obvious fact that reducing
$k$ increases the probability of selecting a discrimination
function at random, or equivalently, that it increases the
relative density of them in $F$. Of course in any interesting
case $\eta(U_m) \gg 1$ so a reduction in $k$ may be of
no great consequence in practical considerations.
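
To give a feeling for the size of the probability (2.4), it can be evaluated exactly with rational arithmetic. This is a sketch in Python; the function names and the sample values of $k$, $n$ and $\eta(U_m)$ are illustrative assumptions.

```python
from fractions import Fraction

def discrimination_probability(k, eta_U):
    """Exact value of 2**(-k * eta_U), the density of F_D in F per (2.4)."""
    return Fraction(1, 2 ** (k * eta_U))

def ratio_from_counts(n, k, eta_U):
    """The same probability formed as the ratio of the counts (2.2) and (2.0)."""
    eta_S, eta_R = 2 ** n, 2 ** k
    return Fraction(eta_R ** (eta_S - eta_U), eta_R ** eta_S)

p_k1 = discrimination_probability(1, 10)   # k = 1, ten prescribed stimuli
p_k2 = discrimination_probability(2, 10)   # k = 2, same stimuli
```

Even with only ten prescribed stimuli the probability is already below one in a thousand for $k = 1$, which illustrates why the gain from reducing $k$ is of little practical weight.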
3. A Pattern Recognizing Automaton.

We consider here a simple example which, while of
interest in itself, may also aid in conceptually understanding
some of the notions previously introduced. The example is
concerned with the recognition, by some type of device, of
a variety of geometric patterns (i.e. printed characters,
etc.). Problems related to the study of such devices are
frequently considered to belong to the field of finite
automata.

One of the main features in this example is a clear
description of how seemingly complicated (visual)
information can be correlated with well defined subsets $S_\mu$.
In general the question of how these subsets differ, or
rather what all the elements $s \in S_\mu$ of any particular
subset have in common, is not directly related to the
theoretical existence problems previously discussed.
However, in any practical discrimination problem this question
is really basic. A considerable part of the discussion
in [4] seems to be concerned with just such matters.

We assume that the patterns to be recognized are
displayed in the unit square, $0 \le x \le 1$, $0 \le y \le 1$,
of the $x$-$y$ plane. On this square we place a uniform
grid $x_\alpha = \alpha h$, $y_\beta = \beta h$ of mesh size $h = 1/p$ and so the
integers $\alpha$, $\beta$ take on the values $0, 1, \ldots, p$. Thus
the unit square is partitioned into $p^2$ elementary squares
of side $h$. We now impose some very severe restrictions
on the patterns to be recognized and on how they are to be
displayed. Later we discuss the relaxation of these
conditions.

$G_1$) Each pattern is composed of elementary squares.

$G_2$) Distinct patterns are composed of different
numbers of elementary squares.

$G_3$) When a pattern is displayed, its boundary must
coincide with segments of any of the grid lines
$x = x_\alpha$, $y = y_\beta$.

These restrictions imply that at most $p^2$ such
patterns can be defined.

Thinking of gadgetry for the moment a pattern may be
displayed in any admissible position by illuminating the
appropriate elementary squares. An automaton is imagined
which has $p^2$ input lines, one from each of the elementary
squares. Then by any of a variety of well-known scanning
techniques a unit signal can be made to appear on those
lines initiating from illuminated squares and no signal
will be present on the others. (In actual practice of
course a negative unit signal is usually used to indicate
no input. However, we do not wish to go into the details
of these technicalities and so will continue to use loose
terminology, as above, in describing hardware.) The
class of possible input signals is thus equivalent to the
set $S$ of $n = p^2$ dimensional binary vectors.
Let the distinct patterns to be recognized be denoted
by the symbols $c_1, c_2, \ldots, c_m$ and let the integer $N_\mu$
be the number of elementary squares required to construct
$c_\mu$, $\mu = 1, 2, \ldots, m$. Then by $G_2$) we must have

    $N_\mu = N_\nu$ if and only if $\mu = \nu$ .    (3.0)

If any pattern $c_\mu$ is displayed in a definite position
on the unit square in accordance with $G_3$), then a
corresponding unique binary vector $s \in S$ can be defined
which represents the resulting input to the automaton.
The set of all such vectors which can be obtained from
all admissible positions of $c_\mu$ is denoted by $S_\mu$.
This is to be done for all $\mu = 1, 2, \ldots, m$. Thus any
admissible display of $c_\mu$ is represented by one and only
one vector in $S_\mu$ and any admissible display of any of
the patterns $c_1, c_2, \ldots, c_m$ is represented by one and
only one vector in

    $U_m = S_1 \cup S_2 \cup \ldots \cup S_m$ .

Furthermore by the property noted earlier of scalar
products of binary vectors we have:

    For all $s \in S_\mu$ , $(s,s) = N_\mu$ ; $\mu = 1, 2, \ldots, m$ .    (3.1)

This result and (3.0) imply that for any $s$ and
$s'$ in $U_m$:

    $(s,s) = (s',s')$ if and only if $s \in S_\mu$ , $s' \in S_\mu$    (3.2)

for some $\mu$ (i.e. they must be images of the same pattern).
Clearly then in this example the feature in common to all
$s \in S_\mu$ is their value of $(s,s)$. Of course it is
elementary that the area of any geometric pattern is
invariant under all translations and rotations. We have
merely required, in $G_2$), that the patterns being considered
have unequal area. (In the above notation the area covered
by $c_\mu$ is simply $h^2 N_\mu$.) These considerations of area
form the basis for generalizing the present example to
much more complicated cases.
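
The construction of a class of input vectors for one pattern, and the invariance expressed by (3.1), can be sketched as follows. This Python example is illustrative only: the helper names `encode` and `translations`, the row-by-row ordering of the input lines and the three-square pattern are assumptions, and only translations (not rotations) are generated for brevity.

```python
def encode(cells, p):
    """Binary vector of dimension p*p: 1 for each illuminated elementary square."""
    lit = set(cells)
    return tuple(1 if (a, b) in lit else 0 for a in range(p) for b in range(p))

def translations(cells, p):
    """All displacements of the pattern that keep it inside the p-by-p grid."""
    rows = [a for a, _ in cells]
    cols = [b for _, b in cells]
    out = []
    for da in range(-min(rows), p - max(rows)):
        for db in range(-min(cols), p - max(cols)):
            out.append([(a + da, b + db) for a, b in cells])
    return out

def inner(x, y):
    """Real inner product of two binary vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))

p = 4
L_shape = [(0, 0), (1, 0), (1, 1)]          # hypothetical pattern with N = 3
S_mu = {encode(c, p) for c in translations(L_shape, p)}
norms = {inner(s, s) for s in S_mu}         # every display gives (s, s) = N
```

Each admissible position yields a different vector, yet all of them share the single value of $(s,s)$.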

Returning to the proposed automaton we let all the
input lines go to some device which adds the binary
signals on them and represents the sum as a binary number.
(Such an adder is simply constructed and would require
$2 \log_2 p$ stages for the fastest series-parallel operation.
The adders in the $k$-th stage would have to add $k$-bit
binary integers in parallel and there would have to be
at most $2^{-k} p^2$ of them. Thus the maximum number of
adders required is $p^2 - 1$, for the fastest operation.)
The largest possible sum is $p^2$ and so the output signal
requires at most $2 \log_2 p$ binary bits or output lines.
If we let $k$ be the smallest integer $> 2 \log_2 p$, then
any output can be represented by a $k$-dimensional binary
vector. The set $R$ of all such vectors, $r$, is the
class of all possible responses of the automaton in
question. Let the vector which is the binary representation
of the integer $N_\mu$ be denoted by $r_\mu$, $\mu = 1, 2, \ldots, m$.
These vectors are clearly unique.
With the definitions

    $I_m = \{S_1, S_2, \ldots, S_m\}$ ,   $O_m = \{r_1, r_2, \ldots, r_m\}$ ,

we can now interpret the discrimination problems of $I_m$
with respect to $O_m$: they are concerned with recognizing
$m$ patterns, $c_\mu$, in various positions in the unit square.
For this problem the automaton whose construction has been
indicated above is a representation of some $f \in F_D(I_m, O_m)$.
Thus the ordinary discrimination problem is solved and
indeed the proposed device should be of practical
significance.

However, strong discrimination is not possible with
this automaton. Clearly some pattern, $c \ne c_\mu$, of elementary
squares can have the area $h^2 N_\mu$; the corresponding
signal $s$ is not a display of $c_\mu$, and so lies outside $U_m$,
yet it produces the response $r_\mu$. So some patterns which are
not $c_\mu$ will be identified as $c_\mu$. Whether this situation
is tolerable or not depends upon the intended use of the
device and the total class of patterns to which it will
be exposed.
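
The behaviour of the adding automaton, including its failure to discriminate strongly, can be seen on a 3-by-3 grid. The following Python sketch is illustrative: the function name `automaton`, the row-by-row ordering of the nine input lines and the three sample displays are assumptions made for the example.

```python
def automaton(s, k):
    """Sum the input lines and return the sum as a k-bit binary vector."""
    total = sum(s)
    return tuple((total >> i) & 1 for i in reversed(range(k)))

p = 3
k = 4                                  # smallest integer > 2*log2(3)
bar   = (1, 1, 0, 0, 0, 0, 0, 0, 0)    # pattern c_1: two adjacent squares
bar_t = (0, 0, 0, 0, 1, 1, 0, 0, 0)    # the same pattern, translated
other = (1, 0, 0, 0, 1, 0, 0, 0, 0)    # two diagonal squares: not c_1
r_1 = automaton(bar, k)
```

Both displays of the bar give the response `r_1`, as ordinary discrimination requires, but so does the diagonal pair, which has the same area without being a display of the bar.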

We turn now to a consideration of the recognition of
more general patterns than those of $G_1$) with greater
freedom of display than in $G_3$). However, the essence of $G_2$)
will be retained in a somewhat altered form. Again the
patterns are denoted by $c_\mu$ and they are to be displayed
in the unit square. We denote the area of $c_\mu$ by $A(c_\mu)$,
$\mu = 1, 2, \ldots, m$. The shape of the $c_\mu$ can be quite
general (say with piecewise smooth boundaries) but we will
not go into any analytical details; the requirement that
$A(c_\mu)$ is well defined can be considered the condition
$G_1'$) which replaces $G_1$). Condition $G_2$) is replaced by:

$G_2'$)   $|A(c_\mu) - A(c_\nu)| \ge \delta > 0$ , $\mu \ne \nu$ ; $\mu, \nu = 1, 2, \ldots, m$ .

In other words the areas of the $c_\mu$ must differ by at
least $\delta$, some fixed positive number.

There are no restrictions on how or where the $c_\mu$
can be displayed in the unit square (and as in footnote 4
they may intersect the boundaries). However, we must now
require the mesh size $h$ to be sufficiently small. To
specify this precisely we should know the exact conditions
of illumination under which a sensed elementary square will
emit a signal (i.e. if half the area is illuminated, etc.).
In any event let $N_\mu(h; x, y, \theta)$ be the number of elementary
squares of side $h$ that emit a signal when the image of
$c_\mu$ has, say, its centroid at $x$, $y$ and some fixed axis
at an angle $\theta$ with the positive $x$-axis. Then we require
$h$ to be such that

$G_3'$)   $A(c_\mu) - \epsilon/2 < N_\mu(h; x, y, \theta) < A(c_\mu) + \epsilon/2$ ,
         $0 \le x, y \le 1$ , $0 \le \theta \le 2\pi$ ;

for some fixed $\epsilon$ in $0 < \epsilon < \delta$. (For the count $N_\mu$
to be comparable with an area, areas must here be measured in
units of $h^2$, the area of one elementary square.) That is, we
require the "sensitized area" of any image of any $c_\mu$ to be
"close" to the exact area. Close here means only less than half
the difference between the two closest areas $A(c_\mu)$.
Now the input classes are defined such that:

    $s \in S_\mu$ if and only if
    $A(c_\mu) - \epsilon/2 < (s,s) < A(c_\mu) + \epsilon/2$ ; $\mu = 1, 2, \ldots, m$ .    (3.3)

From conditions $G_2'$) and $G_3'$) it follows that these sets
$S_\mu$ are disjoint.

The output or response classes of the automaton are
now defined by means of the generalization described in
footnote 2. If for any binary vector $x$ we let $N(x)$
be the integer whose binary representation is given by $x$
then the response classes, $R_\mu$, are defined by:

    $r \in R_\mu$ if and only if
    $A(c_\mu) - \epsilon/2 < N(r) < A(c_\mu) + \epsilon/2$ ; $\mu = 1, 2, \ldots, m$ .    (3.4)


With the above definitions of $S_\mu$ and $R_\mu$, and $h$
taken to satisfy $G_3'$), the automaton previously described
solves the ordinary discrimination problem for the very
general patterns now allowed. Of course as before some
spurious inputs may be recognized as patterns. But it
should also be noted that now inexact representation of
the patterns or even malfunctioning of a few of the inputs
from elementary squares need not destroy proper recognition.
The "amount" of error that can be tolerated is determined
by $\epsilon$ and $h$ for any given $c_\mu$.
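
The tolerance to faulty inputs can be illustrated by classifying a signal through the band of width $\epsilon$ around each area. This is a sketch in Python under stated assumptions: areas are expressed in units of one elementary square so that they can be compared with the count $(s,s)$, and the function name `classify`, the two areas and the value of $\epsilon$ are hypothetical.

```python
def classify(s, areas, eps):
    """Index mu with |(s, s) - A(c_mu)| < eps/2, or None if no class matches."""
    count = sum(s)                      # equals (s, s) for a binary vector
    for mu, area in enumerate(areas):
        if area - eps / 2 < count < area + eps / 2:
            return mu
    return None

areas = [6.0, 12.0]     # hypothetical areas, in units of one elementary square
eps = 4.0               # satisfies 0 < eps < delta = 6
exact  = (1,) * 6 + (0,) * 10
faulty = (1,) * 5 + (0,) * 11    # one input line failed to fire
far    = (1,) * 9 + (0,) * 7     # matches neither class
```

The faulty display is still assigned to the first class, while a signal whose count falls between the two bands is assigned to none.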

The practicality of such a general pattern recognizing
automaton must depend upon the value of $h$ required. Of
course if the patterns in question have complicated shapes
then small values of $h$ are necessary for good resolution
of the areas of the images. Similarly if two patterns
have nearly equal area (i.e. small $\delta$) then $\epsilon$ must be small
and, regardless of the complexity of their shapes, $h$
must again be small to satisfy $G_3'$) for these patterns. The
number of elementary squares required is $p^2 = 1/h^2$ and the
number of adders required has been shown to be at most
$p^2 - 1$. However, these adders are of unequal complexity
but it is easily shown that they can all be composed of
$p^2 (p^2 - 1)$ basic units. Thus the total basic hardware
required is of the order of $p^4 = 1/h^4$ units. While these
estimates lead to large numbers they indicate that meshes
of the order of $h = 10^{-2}$ could be realized. If slower
speeds are allowed, which seems quite reasonable, the
hardware can be greatly reduced by the use of simple
counters and appropriate time delays.
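
The hardware estimates quoted above are simple to tabulate. The following Python sketch is illustrative; the function names are assumptions, and the stage-by-stage count presumes that $p$ is a power of two so that each stage halves the number of partial sums exactly.

```python
import math

def adders_per_stage(p):
    """Adders in stages 1, 2, ... of a binary adder tree over p*p inputs
    (p is assumed to be a power of two so every stage halves exactly)."""
    stages = 2 * int(math.log2(p))
    return [p * p // 2 ** k for k in range(1, stages + 1)]

def estimates(p):
    """Elementary squares, adders, and basic units as quoted in the text."""
    squares = p * p
    adders = squares - 1
    basic_units = squares * (squares - 1)
    return squares, adders, basic_units

stage_counts = adders_per_stage(8)           # p = 8: six stages
squares, adders, units = estimates(100)      # h = 1/p = 10**-2
```

For $p = 8$ the stage counts sum to $p^2 - 1 = 63$, and for $h = 10^{-2}$ the estimate is about $10^8$ basic units.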
4. "Perceptron"-like Automata.

The discussion of Section 2 is so general that (with
the inclusion of time delays, which are considered briefly
in Section 7) it applies to most finite automata. In
the sense of that discussion two automata are completely
equivalent if they correspond to the same set function
$f \in F$. However there remain a number of important
questions:

(1) Can an automaton be constructed according to some
definite rules and represent any discrimination function
$f \in F_D$ of a given discrimination problem?

(2) Can, instead, an automaton be constructed which will
approximate sufficiently closely any $f \in F_D$, in some
appropriate norm?

In the fundamental papers of Kleene [2] and von
Neumann [3] it is shown that these questions can be
answered in the affirmative. More particularly, for a
specific class of discrimination problems, Kleene
characterizes all those $F_{str.D}$ for which an equivalent
automaton of a specified construction exists. von Neumann
shows the existence for some $F_{str.D}$ of automata
constructed in a slightly different manner. His main concern,
however, is with a thorough analysis of the second question
using a "probabilistic" norm (while the basic units of
which the automaton is constructed are not assumed to function
perfectly). Also in the previous section a particular
automaton is described which represents a variety of
specific discrimination functions. In the light of these
results it would seem advisable for any proposed automaton
to first study the existence or approximation problems.

A specific class of automata is defined by specifying
the basic elements of which it is to be composed together
with rules for their combination or connection. This is
done in complete detail for a variety of such automata
in [1,2,3] and more vaguely in Section 2. The basic
elements are usually called "neurons" and a collection of
them formed into an automaton by the prescribed rules of
connection forms a "nerve net". By selecting a particular
adder such a description is easily given for the pattern
recognizing automaton. We shall try to develop such a
formulation for an even more vaguely proposed automaton
referred to as a "perceptron" in [4]. The "nerve net"
to be introduced may not have all of the properties
mentioned in [4] but it is believed that most of the
excluded properties are more restrictive. Hence the proposed
model should include as special cases various types of
"perceptrons".

The S-units: In [4] basic units are introduced which
essentially describe the binary nature of the input signals
to an automaton. The only concern with such units need be
in discussions of the digitalization of various kinds of
"information". Since we assume such techniques known
these input units are really superfluous. However, if it
aids conceptually one may think of an S-unit as having
only two possible states: stimulated and non-stimulated
(as an example we may think of the elementary squares).
From each such unit emanates one output line (wire or nerve
fiber). If an S-unit is stimulated at time $t$, a unit
signal is instantaneously transmitted on its output line.

If there are $n$ such S-units in a given automaton
they can be ordered in some arbitrary but fixed manner.
Then the "state" of all the S-units at any instant can be
represented by some $n$-dimensional binary vector, $s \in S$.
(The symbol $S$ always represents the set in Defn. 2.2.
The combination "S-unit" has the meaning implied above
and should cause no confusion.) Thus $S$ is the set of
all possible states of the S-units of the automaton in
question.

The  Generalized  A-units;   This  unit  is  a  modification  of 
one  of  the  special  "neuron"  models  used  in  [1]  and  [3]o 
A  schematic  diagram  of  the  A.-unlt  is  presented  in  Figure  1„ 
It  has  one  output  line  and  some  positive  finite  number  of 
input  lines o   The  input  lines  are  attached  to  the  A-unit 
by  one  of  three  types  of  connections'^:   "e"  or  excitatory^ 
"1",  ^r  inhibitory!   "c"^  or  value  changing.   Each  of  the 
input  lines  can  be  either  stimulated  or  non-stimulated  and 
in  the  former  state  they  instantaneously  transmit  a  unit 
signal  to  the  A-unit  through  the  appropriate  type  of 
connection. The A-unit itself is stimulated if

$$n_e - n_i \ge \theta, \qquad (4.0)$$

and non-stimulated otherwise. Here $n_e$ and $n_i$ are the
numbers of stimulated "e" and "i" input lines, respectively,
and $\theta$ is a fixed positive constant called the
threshold. (We note here immediately that non-integral
values of $\theta$ are superfluous since (4.0) implies that
the state of an A-unit is a piecewise constant function
of $\theta$.⁶) If the A-unit is stimulated at time $t$ it
transmits a signal, at time $t+\delta_A$, on the output line.
However, this signal need not be a unit signal but has
associated with it a "value",⁷ $v(t+\delta_A)$, say the amplitude of
the signal, which is a function of $v(t)$ and $n_c(t)$,
the number of "c" input lines stimulated at time $t$.
The time lag, $\delta_A$, is a fixed non-negative quantity
(see further discussion).
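The stimulation rule for a single A-unit can be sketched in a few lines of Python. This is only an illustration, assuming the inequality in (4.0) is read as $n_e - n_i \ge \theta$; the function names `a_unit_state` and `same_behaviour` are hypothetical and not part of the original formulation. The second function illustrates the remark that non-integral thresholds are superfluous.

```python
def a_unit_state(e_lines, i_lines, theta):
    """Binary state of one A-unit: 1 if n_e - n_i >= theta, else 0."""
    n_e = sum(e_lines)  # number of stimulated "e" input lines
    n_i = sum(i_lines)  # number of stimulated "i" input lines
    return 1 if n_e - n_i >= theta else 0


def same_behaviour(theta_a, theta_b, n_e_max=4, n_i_max=4):
    """True if two thresholds give the same state for every count pair."""
    return all(
        (n_e - n_i >= theta_a) == (n_e - n_i >= theta_b)
        for n_e in range(n_e_max + 1)
        for n_i in range(n_i_max + 1)
    )


# Two stimulated "e" lines and one stimulated "i" line, threshold 1.
print(a_unit_state([1, 1, 0], [1], theta=1))
# Since n_e - n_i is an integer, theta = 1.5 behaves exactly like theta = 2.
print(same_behaviour(1.5, 2))
```

Because $n_e - n_i$ only takes integer values, any threshold strictly between two consecutive integers can be replaced by the larger integer without changing the unit's state.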

If there are $j$ such A-units in a given automaton
they can be ordered in some arbitrary but fixed manner.
Then the "state with regard to stimulation" of all the
A-units at any instant, $t$, can be represented by some
j-dimensional binary vector, say $a(t) \in A$ (1 corresponding
to stimulation and 0 otherwise). Thus the set $A$ of
Defn. 2.2 represents the set of all possible states of
stimulation  of  the  A-units  at  any  instant. 

Let the diagonal square matrix of order $j$,

$$V(t) = \mathrm{diag}\big(v_1(t), v_2(t), \ldots, v_j(t)\big), \qquad (4.1)$$

contain as β-th diagonal entry the value, $v_\beta(t)$, of
the β-th A-unit (according to the above implied ordering)
at time $t$. Then the j-dimensional vector

$$V(t+\delta_A)\, a(t) \qquad (4.2)$$

represents the state, with regard to value, of the output
lines of all the A-units at time $t+\delta_A$. (The form of
functional dependence of $v(t+\delta_A)$ on $v(t)$ and $n_c(t)$
will be shown later to be superfluous for our purpose.)

The R-units: A schematic diagram of an R-unit is presented
in Figure 2. It has one output line and any positive finite
number of input lines. The input lines are connected to
one of two types of connections, "e" or "i". Each
input line is either stimulated or non-stimulated and only
in the former case does it instantaneously transmit a signal
to the R-unit through the appropriate connection. However,
these input signals need not be unit signals but have the
value (magnitude) $v$ carried by the corresponding input
line. The R-unit will be stimulated if the sum of the
values of the stimulated "e" input lines minus the sum
of the values of the stimulated "i" input lines is $\ge \theta'$.
Otherwise the R-unit is non-stimulated. That is, if $v_\sigma(t)$,
$\sigma = \sigma_1, \sigma_2, \ldots, \sigma_q$, are the values on the $q$ input lines⁸
to a given R-unit, then it is stimulated if

$$\sum_{q'=1}^{q} y_{\sigma_{q'}}\, v_{\sigma_{q'}}(t) \ge \theta', \qquad (4.3)$$

where $y_{\sigma_{q'}} = +1$ or $-1$ according as the $\sigma_{q'}$-th input line
is of connection type "e" or "i", respectively. If
an R-unit is stimulated at time $t$ it transmits a unit
signal on the output line at time $t+h(t)$. The time lag,
$h(t)$, is to be a function of the values, $v_\sigma(t)$, of
those input lines which are stimulated at time $t$. This⁹
time lag will be dispensed with later.

If  there  are   k   such  R-units  in  a  given  automaton 
then,  as  with  the  S-units  and  A-units,  the  set   R  of 
Defn. 2.2 represents all possible states of stimulation
of the R-units. Any binary vector $r \in R$ is a possible
"state"  of  all  the  R-units  at  any  instant   t   (and 
similarly  represents  a  possible  state  of  the  output  lines 
of  all  R-units  at  any  instant). 

The General Perceptron "Nerve-net": Rules for the combination
of the three basic units and the application of
inputs (or stimulation and non-stimulation) to the S-units
determine a general nerve-net, automaton or perceptron-like
device. These rules, insofar as we can determine them from
[4], are as follows (with some possibly trivial but
necessary modifications and additions of our own):

(i) The output line from any basic unit can be divided
into any finite number of branches each transmitting the
identical signal initiated by that unit.

(ii) The output branches from an S-unit must be connected
to the "e" or "i" inputs of an A-unit. At most one
such branch from any S-unit can be connected to any A-unit.
Every S-unit must be connected to at least one A-unit.

(iii) The output branches from an A-unit must be connected
to the inputs of an R-unit, with at most one such branch
from any A-unit to any R-unit, and at least one such
connection from each A-unit.

(iv) The output branches from an R-unit may be connected
to the "c" input of an A-unit.

(v) Signals to the S-units are to be applied at successive
instants of time, $t_0, t_0+\delta_s, t_0+2\delta_s, \ldots, t$, for finite
sequences of intervals.

The above rules and the previous definitions indicate
that there are still some missing rules or information.
In particular the possible time delays $\delta_s$, $\delta_A$, and $h(t)$
should probably all be integral multiples of some unit
interval for any reasonably functioning and understandable
non-analogue device. Furthermore, depending upon the type
of behaviour desired with respect to the past history (i.e.
"static" or "dynamic" memory effects) $\delta_s$ should be greater
than $\delta_A + h(t)$ or not. In the former case the entire
memory effect resides in the values, $v_\alpha(t)$, while in
the latter case more complicated effects are possible.
It would seem for most considerations in [4] that the
assumption

$$\delta_s > \delta_A + \max_t h(t) \qquad (4.4)$$

is all that is required, and so we shall adopt it. (In
[2] and [3] a more complicated case is considered.)

An  important  limitation  imposed  by  the  rules  (ii)-(iv) 
is  that  such  an  automaton  is  incapable  of  counting  or, 
what  is  essentially  equivalent,  there  can  be  no  "closed" 
active  loops  which  transmit  a  signal  periodically,  say 
with fixed period $\delta_A$. These properties would become
possible if outputs of A-units or R-units were permitted
to be "e" and "i" inputs of A-units. (The "logical
depth", in the sense of [2] and [3], of the current automata
is then rather restricted.) In this regard there seems
to be some confusion in [4] where the verbal rules for
connections between basic units, pp. 25-27 et seq., contradict
various diagrams, Figs. 1, 2b, 3 et seq. However,
the  extra  connections  allowed  in  the  diagrams,  from 
R-units  to  R-units  or  A-units  and  from  A-units  to  A-units, 
are  all  "inhibitory"  and  thus  would  still  not  yield  the 
desired  additional  features  mentioned  above. 

There  are  further  restrictions  imposed  on  the  nets 
considered in [4]. These will be dismissed later.

Using the notions of binary vector, etc., we may now
formulate a (mathematical) model of the proposed automaton.
The "input" to the automaton at any instant $t$ is represented
by a vector $s(t) \in S$. The total resultant input
to the α-th A-unit at time $t$ can then be represented by
the inner product

$$(w_\alpha, s(t))$$

where $w_\alpha$ is an n-component row vector whose components
are $0, \pm 1$ according to the rule:

$$w_\alpha = (w_{\alpha,1}, w_{\alpha,2}, \ldots, w_{\alpha,n}),$$

$$w_{\alpha,\beta} = \begin{cases}
0 & \text{if no output branch from the } \beta\text{-th input goes to the } \alpha\text{-th A-unit;} \\
+1 & \text{if an output branch from the } \beta\text{-th input goes to an "e" input of the } \alpha\text{-th A-unit;} \\
-1 & \text{if an output branch from the } \beta\text{-th input goes to an "i" input of the } \alpha\text{-th A-unit.}
\end{cases} \qquad (4.5)(a)$$

Then forming the j-rowed by n-columned rectangular matrix

$$W = (w_{\alpha,\beta}) = \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_j \end{pmatrix}, \qquad (4.5)(b)$$

the inputs to all A-units at time $t$ are given by

$$W s(t).$$

To denote the state of stimulation of the A-units at
time $t$ we introduce a threshold function which maps any
real finite dimensional vector into a binary vector of
the same dimension. Thus if $x$ is a p-dimensional real
vector

$$T_\theta(x) = y, \qquad y_\alpha = \begin{cases} 0, & \text{if } x_\alpha < \theta \\ 1, & \text{if } x_\alpha \ge \theta \end{cases} \qquad \alpha = 1, 2, \ldots, p. \qquad (4.6)$$

Then the state of stimulation of all the A-units at time $t$
is given by:

$$a(t) = T_\theta(W s(t)). \qquad (4.7)$$

Here, of course, $a(t) \in A$ and (4.7) represents a single
valued function on $S$ to $A$.
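As a small illustration of the threshold function and of the map from $S$ to $A$, the following Python sketch evaluates $T_\theta(W s)$ for a hand-picked connection matrix. The matrix `W`, the threshold value and the helper names `threshold` and `matvec` are assumptions chosen for the example only, and the comparison is taken as $x_\alpha \ge \theta$.

```python
def threshold(x, theta):
    """Componentwise threshold: 1 where x_a >= theta, 0 elsewhere."""
    return [1 if x_a >= theta else 0 for x_a in x]


def matvec(M, v):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(m * c for m, c in zip(row, v)) for row in M]


# Hypothetical net with n = 3 S-units and j = 2 A-units.
# Row alpha lists the connection types into the alpha-th A-unit:
# +1 for "e", -1 for "i", 0 for no branch.
W = [[1, 1, 0],
     [0, -1, 1]]
theta = 1

a_first = threshold(matvec(W, [1, 0, 1]), theta)   # W s = [1, 1]
a_second = threshold(matvec(W, [1, 1, 0]), theta)  # W s = [2, -1]
print(a_first, a_second)
```

Each input vector yields exactly one binary vector of A-unit states, which is the single valued character of the map noted in the text.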

Assuming, for the present, that the values of all
A-units at time $t$, the stimulated "c" input lines to
all A-units at time $t$, and the rule for determining the
values at $t+\delta_A$ are known we form, using (4.1), (4.2) and
(4.7),

$$V(t+\delta_A)\, T_\theta(W s(t)). \qquad (4.8)$$

This j-dimensional vector represents the state of the
output lines of all A-units at time $t+\delta_A$. Now the inputs
to all R-units at this time can be expressed by introducing
the matrix $Y$, analogous to $W$:

$$Y \equiv (y_{\alpha,\beta}), \qquad (4.9)$$

$$y_{\alpha,\beta} = \begin{cases}
0 & \text{if no output branch from the } \beta\text{-th A-unit goes to the } \alpha\text{-th R-unit;} \\
+1 & \text{if an output branch from the } \beta\text{-th A-unit goes to an "e" input of the } \alpha\text{-th R-unit;} \\
-1 & \text{if an output branch from the } \beta\text{-th A-unit goes to an "i" input of the } \alpha\text{-th R-unit.}
\end{cases}$$

Using (4.8), (4.9) and the threshold function we finally
define

$$r(t+\delta_A) = T_{\theta'}\big[\,Y\, V(t+\delta_A)\, T_\theta[W s(t)]\,\big]. \qquad (4.10)$$

This is a k-dimensional binary vector which represents
the state with regard to stimulation of all the R-units
at time $t+\delta_A$. Thus assuming we know how to evaluate
$V(t+\delta_A)$, the explicit expression (4.10) determines which
of the R-units are about to transmit signals and which
not. Furthermore the above expression represents a single
valued function on $S$ to $R$ and thus must be some $f \in F$.
Which particular function it represents at any instant
depends on the specification of the connections $W$, $Y$ and
some as yet unrepresented ones (the "c" inputs), the
thresholds $\theta$ and $\theta'$, the rules for computing values
and perhaps time lags $h(t)$, the past history of inputs
$s(t_0), s(t_0+\delta_s), \ldots, s(t-\delta_s)$ and initial state of the
values, $V(t_0)$. In spite of this seeming complexity if
some input signal $s \in S$ at time $t$ is to be mapped
into some response signal $r \in R$ at any time $t+\delta_A+h(t+\delta_A)$
for any past history, by the above type of automaton then
there must exist (constant) quantities, $Y$, $V$, $W$, $\theta$ and $\theta'$
such that

$$r = T_{\theta'}\big[\,Y\, V\, T_\theta[W s]\,\big]. \qquad (4.11)$$
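The composite map from a stimulus vector to a response vector can be sketched as follows. This is a minimal illustration under stated assumptions: the diagonal matrix $V$ is stored as a plain list `v` of A-unit values, thresholds compare with `>=`, and the particular `W`, `Y`, `v` and thresholds are invented for the example rather than taken from any source.

```python
def threshold(x, theta):
    """Componentwise threshold: 1 where x_a >= theta, 0 elsewhere."""
    return [1 if x_a >= theta else 0 for x_a in x]


def matvec(M, v):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(m * c for m, c in zip(row, v)) for row in M]


def response(s, W, v, Y, theta, theta_p):
    """Evaluate r = T_theta'[ Y V T_theta[ W s ] ] with V = diag(v)."""
    a = threshold(matvec(W, s), theta)               # A-unit states
    valued = [v_a * a_a for v_a, a_a in zip(v, a)]   # V a, V diagonal
    return threshold(matvec(Y, valued), theta_p)     # R-unit states


# Hypothetical net: n = 3 S-units, j = 2 A-units, k = 2 R-units.
W = [[1, 1, 0], [0, -1, 1]]
v = [0.5, 2.0]
Y = [[1, 1], [1, -1]]

r_first = response([1, 0, 1], W, v, Y, theta=1, theta_p=1)
r_second = response([1, 1, 0], W, v, Y, theta=1, theta_p=1)
print(r_first, r_second)
```

With fixed `W`, `v`, `Y` and thresholds the function depends on `s` alone, which is the sense in which the constants in (4.11) determine one particular function on $S$ to $R$.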

Furthermore if for some $t+\delta_A+h(t+\delta_A)$, as is implied by
the so-called "learning experiments" in [4], the automaton
is to be able to give the same response, say $r_1$, to
any $s \in S_1 \subset S$, then (4.11) must represent a discrimination
function of $I_1 = \{S_1\}$ with respect to $O_1 = \{r_1\}$. In
fact discrimination with respect to two sets is usually
discussed in [4] and the implications are that an arbitrary
finite number could be used. Thus if this is ever to be
possible it must be proven that there exist discrimination
functions of the form (4.11). This is trivial and is done
in the next section. The more difficult and interesting
task however would be to prove that some particular discrimination
function can have this form.¹⁰ Once such
results were obtained it would become reasonable to investigate
the seemingly still more difficult problem of
the existence of "learning sequences", $s(t_0), s(t_0+\delta_s),
\ldots, s(t-\delta_s)$, which would produce the desired discrimination
function.

Some of the properties of the operation of the automaton
have been neglected in the above discussion, in
particular the delayed response of the R-units. However,
it is clear that such considerations do not add any new
"degrees of freedom" to the representation (4.10) since
only those R-units stimulated at time $t+\delta_A$ may transmit
signals later. Also the specific nature of the value
change cannot alter the discussion concerning (4.11).
These features are no doubt related to the problems concerning
"learning sequences" since they embody all the
memory aspects of the device.

In [4] there are many additional restrictions placed
on the net, i.e., "randomness" of various connecting lines,
equality of all thresholds, same number of "e" and "i"
inputs to each A-unit, and perhaps more. But again these
conditions can only restrict the generality formulated
above and possibly simplify some analysis. Hence we
shall not bother with these details in the present discussion.

5. Some Necessary Conditions for Discrimination by
Perceptron-Like Automata.

It  was  shown  in  the  previous  section  that  in  order 
for  a  perceptron-like  automaton  to  represent  a  discrimination 
function,  a  function  of  the  form 

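$$r = T_{\theta'}\big[\,Y\, V\, T_\theta[W s]\,\big] \equiv b(s) \qquad (5.0)$$

must also be able to represent a discrimination function.
In this section we consider only functions of this form,¹¹
(5.0), where, to summarize:

(a) $s \in S$, $r \in R$;

(b) $W = (w_{\alpha,\beta})$, $w_{\alpha,\beta} = 0, \pm 1$;
    $Y = (y_{\gamma,\alpha})$, $y_{\gamma,\alpha} = 0, \pm 1$;
    $V = (v_\alpha \delta_{\beta,\alpha})$, $v_\alpha$ = arb. real nos.;
    $\beta = 1, 2, \ldots, n$; $\alpha = 1, 2, \ldots, j$; $\gamma = 1, 2, \ldots, k$;          (5.1)

(c) $\theta, \theta' > 0$, $T_\theta(x) = y$,
    $y_\nu = \begin{cases} 0, & \text{if } x_\nu < \theta \\ 1, & \text{if } x_\nu \ge \theta. \end{cases}$
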

Thus it is easily shown that (5.0 - .1) defines a single
valued function on $S$ to $R$. Furthermore this function
may be considered a "product" of two functions

$$b(s) = h(g(s)), \qquad g(s) = T_\theta[W s], \qquad h(a) = T_{\theta'}[Y V a]. \qquad (5.2)$$

The function $g$ is single valued on $S$ to $A$ and the
function $h$ is single valued on $A$ to $R$.

Definition 5.1. $B$ = {the set of all functions $b(s)$
of the form (5.0 - .2) on $S$ to $R$}.

The set $B$ is thus obtained by considering all possible
"connections", $W$ (from S-units to A-units) and $Y$ (from
A-units to R-units), all possible values, $V$, of A-units,
and all possible positive thresholds $\theta$ and $\theta'$. Thus
every possible "state" of a perceptron at any instant can
be represented by some function $b \in B$.

From the positivity assumption, (5.1c), on the
thresholds it follows that for all $b \in B$:

$$b(0) = 0 \qquad (5.3)$$

where the null vectors, $0$, are to belong to the appropriate
sets $S$ and $R$. Thus we have

Tr. 5.1. $B$ is a proper subset of $F$.

Proof. That $B \subseteq F$ follows from the definition (5.0 - .1);
i.e., each $b \in B$ is a single valued function on $S$ to $R$
and hence $b \in F$. $B \ne F$ follows from (5.3) and that $B$
is not empty is obvious.
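The fact that every function of this form sends the null stimulus to the null response can be checked numerically. The Python sketch below draws random connections, values and positive thresholds and evaluates the response to the zero input; the sampling ranges, the seed and the helper names (`random_net`, `response`) are arbitrary assumptions made for this illustration.

```python
import random


def threshold(x, theta):
    """Componentwise threshold: 1 where x_a >= theta, 0 elsewhere."""
    return [1 if x_a >= theta else 0 for x_a in x]


def matvec(M, v):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(m * c for m, c in zip(row, v)) for row in M]


def response(s, W, v, Y, theta, theta_p):
    """Evaluate T_theta'[ Y V T_theta[ W s ] ] with V = diag(v)."""
    a = threshold(matvec(W, s), theta)
    valued = [v_a * a_a for v_a, a_a in zip(v, a)]
    return threshold(matvec(Y, valued), theta_p)


def random_net(n, j, k, rng):
    """Random connections in {-1, 0, +1}, real values, positive thresholds."""
    W = [[rng.choice((-1, 0, 1)) for _ in range(n)] for _ in range(j)]
    Y = [[rng.choice((-1, 0, 1)) for _ in range(j)] for _ in range(k)]
    v = [rng.uniform(-5.0, 5.0) for _ in range(j)]
    return W, v, Y, rng.uniform(0.1, 3.0), rng.uniform(0.1, 3.0)


rng = random.Random(0)
n, j, k = 4, 3, 2
null_maps_to_null = all(
    response([0] * n, *random_net(n, j, k, rng)) == [0] * k
    for _ in range(200)
)
print(null_maps_to_null)
```

Any function on $S$ to $R$ that assigns a non-null response to the null stimulus therefore lies outside $B$, which is the content of the proper-subset statement.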

This result, though trivial, clearly indicates that
perceptron-like automata cannot represent all functions
on $S$ to $R$. Another trivial though unfortunately
relatively useless result is contained in

Tr. 5.2. For every $b \in B$ there exist an integer $m \le 2^n$
and sets $I_m$ and $O_m$ such that $b \in F_D(I_m, O_m)$. (In fact
the same is true for all $f \in F$.) The proof is obvious.

However, this result might give some hope for finding
"useful" discrimination functions in $B$. We turn to this
question now.

Definition 5.2. Let $g(s)$ and $h(a)$ be any two functions
of the form (5.2). Then $A_g(S')$ = {all $a \in A$ such that
$a = g(s)$ for some $s \in S' \subseteq S$};
$R_h(A')$ = {all $r \in R$ such that $r = h(a)$ for some
$a \in A' \subseteq A$}. (That is, $A_g(S')$ is the image in $A$ of
$S'$ and $R_h(A')$ is the image in $R$ of $A'$.)

From the above definition it is clear that, since $g$
and $h$ are single valued, the numbers of elements satisfy

$$\eta(R_h(A_g(S'))) \le \eta(A_g(S')) \le \eta(S') \le \eta(S) = 2^n,$$
$$\eta(R_h(A')) \le \eta(A') \le \eta(A) = 2^j. \qquad (5.4)$$

Then for any given discrimination problem, characterized
by some set $F_D(I_m, O_m)$, we have

Tr. 5.3. A necessary condition that $F_D(I_m, O_m) \cap B \ne \emptyset$
(i.e. that $B$ contain a discrimination function of $I_m$
w.r.t. $O_m$) is $2^j \ge m$.

Proof. From (5.2) and (5.4) it follows that $\eta(R_h(A)) \le 2^j$
for all $h$ and hence for all $b \in B$. But $\eta(O_m) = m$
and the proof is complete.
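The counting argument can be illustrated by enumerating every stimulus of a small net and collecting the distinct responses. In the Python sketch below the sizes `n`, `j`, `k`, the random sampling and the helper names are assumptions made for the example; the point is only that the number of distinct responses never exceeds $2^j$, however many R-units are present.

```python
import random
from itertools import product


def threshold(x, theta):
    """Componentwise threshold: 1 where x_a >= theta, 0 elsewhere."""
    return [1 if x_a >= theta else 0 for x_a in x]


def matvec(M, v):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(m * c for m, c in zip(row, v)) for row in M]


def distinct_responses(W, v, Y, theta, theta_p):
    """Set of all responses produced over the 2**n binary stimuli."""
    n = len(W[0])
    seen = set()
    for s in product((0, 1), repeat=n):
        a = threshold(matvec(W, s), theta)
        valued = [v_a * a_a for v_a, a_a in zip(v, a)]
        seen.add(tuple(threshold(matvec(Y, valued), theta_p)))
    return seen


rng = random.Random(1)
n, j, k = 5, 2, 4          # 32 stimuli, 2 A-units, 16 conceivable responses
max_seen = 0
for _ in range(100):
    W = [[rng.choice((-1, 0, 1)) for _ in range(n)] for _ in range(j)]
    Y = [[rng.choice((-1, 0, 1)) for _ in range(j)] for _ in range(k)]
    v = [rng.uniform(-3.0, 3.0) for _ in range(j)]
    count = len(distinct_responses(W, v, Y, 1.0, 1.0))
    max_seen = max(max_seen, count)

print(max_seen, "<=", 2 ** j)
```

With two A-units no choice of connections or values can separate more than four classes of stimuli, in line with the bound $2^j \ge m$.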

This  result  indicates  that  there  must  be  at  least 
($\log_2 m$) A-units in any perceptron which is to discriminate
between $m$ sets of stimuli. (It is suggested in [4] that
many A-units should be used but, since $m = 1$ or $2$ in most
discussions there, the required condition of Tr. 5.3 is
easily satisfied with relatively few A-units. However,
see the discussion after Tr. 5.5.)

Tr. 5.4. A necessary condition that $F_D(I_m, O_m) \cap B \ne \emptyset$
is that either $0 \notin U_m$ or else $f(0) = 0$ for some
$f \in F_D$.

Proof. Obvious from (5.3).

This condition seems to be completely disregarded or
unsuspected in [4]. On the other hand Kleene [2] carefully
distinguishes so-called "positive definite events" which
essentially require $0 \notin U_m$. Thus a result of the above
form is relevant to various kinds of automata, not only
those described by the set $B$.

Tr. 5.5. A necessary condition that $F_D(I_m, O_m) \cap B \ne \emptyset$
is that for some function $g$, defined in (5.2),

$$A_g(S_\mu) \cap A_g(S_\nu) = \emptyset \quad \text{for all } \mu, \nu \text{ such that } \mu \ne \nu. \qquad (5.5)$$

Proof. Since every $b \in B$ is of the form $b = h(g)$,
by (5.2), for any $b \in F_D(I_m, O_m)$ there must exist $h$ and
$g$ such that:

$$R_h(A_g(S_\mu)) = r_\mu, \qquad \mu = 1, 2, \ldots, m.$$

However, the functions $h$ are single valued and so the
sets $A_g(S_\mu)$ must be pairwise disjoint (we recall that the
elements $r_\mu$ are distinct). This concludes the proof.
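The disjointness condition is easy to test for a given choice of connections. The following Python sketch computes the images of the stimulus classes under $g(s) = T_\theta[W s]$ and checks that they are pairwise disjoint; the matrix `W`, the threshold and the two pairs of classes are hypothetical examples, and `images_disjoint` is a name introduced here for illustration.

```python
def threshold(x, theta):
    """Componentwise threshold: 1 where x_a >= theta, 0 elsewhere."""
    return [1 if x_a >= theta else 0 for x_a in x]


def matvec(M, v):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(m * c for m, c in zip(row, v)) for row in M]


def g(s, W, theta):
    """State of the A-units produced by stimulus s, as a hashable tuple."""
    return tuple(threshold(matvec(W, s), theta))


def images_disjoint(classes, W, theta):
    """True if the images of the stimulus classes under g never overlap."""
    images = [{g(s, W, theta) for s in S_mu} for S_mu in classes]
    return all(
        images[mu].isdisjoint(images[nu])
        for mu in range(len(images))
        for nu in range(mu + 1, len(images))
    )


W = [[1, 1, 0], [0, -1, 1]]

# These two stimuli reach different A-unit states, so h can still separate them.
separable = images_disjoint([[(1, 0, 1)], [(1, 1, 0)]], W, theta=1)
# These two stimuli collapse onto the same A-unit state, so no h can help.
collapsed = images_disjoint([[(1, 0, 0)], [(0, 1, 0)]], W, theta=1)
print(separable, collapsed)
```

When two classes share an A-unit state, every single valued $h$ assigns them the same response, so the failure cannot be repaired by the choice of $Y$, $V$ or $\theta'$.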
There is some discussion on page 41 of [4] which may
have relevance to the above condition. In fact the statement
"No restraints are placed on S(-unit) connections ..."
would seem to directly violate the implications of Tr. 5.5.
If we assume "random" connections from S-units to A-units,
subject to the restrictions on page 41 of [4], it should
not be too difficult to calculate the probability of
violating (5.5). It is not clear that this probability
can be made negligibly small for "practical" parameter
values, but again, requiring a "large" number of A-units,
as is suggested in [4], is a step in the right direction.

6. Approximate Discrimination.

In mechanisms of the complexity implied by the usual
concepts of automata it is perhaps unreasonable to require
that a proposed automaton exactly represent a given
discrimination class, $F_D(I_m, O_m)$. The most obvious reason
is the possible malfunctioning of the basic units, of which
there are assumed to be very many ($\approx 10^{10}$ for human
systems). This type of difficulty has been investigated by
von  Neumann  [3]  and,  in  principle,  he  has  shown  that 
"reliable"  systems  (with  an  arbitrarily  small  probability 
of  error  in  strong  discrimination)  can  be  constructed  for 
a particular type of nerve-net and discrimination problem.
However, it seems reasonable not to expect exact discrimination
on another (not unrelated) basis. This is, roughly, that
in analogy with human systems two stimuli in the same class,
say $S_\mu$, could have responses that are "close" to each
other but not necessarily identical (as in Section 3 where
generalized discrimination was used to avoid this difficulty).
These notions can be made precise by introducing some
measure of the "distance" of any function $f \in F$ from the
discrimination class.

Definition 6.1. For any discrimination class $F_D(I_m, O_m)$
we assign a real number, $\|f\|_D$, to each $f \in F$ by:

$$\|f\|_D = \frac{1}{km} \sum_{\mu=1}^{m} \frac{1}{\eta(S_\mu)} \sum_{s \in S_\mu} \big(f(s) - r_\mu,\; f(s) - r_\mu\big). \qquad (6.0)$$

Clearly $\|f\|_D = 0$ if and only if $f \in F_D$. The
maximum value of $(f(s) - r_\mu, f(s) - r_\mu)$ is $k$, which
occurs if and only if each component is $\pm 1$. This implies
that $f(s)$ is the "complement" of $r_\mu$ with respect to
the component values 0, 1; or in obvious terminology
"$f(s)$ is as different from $r_\mu$ as possible". If for each
$s \in U_m$, $f(s)$ is as different from the corresponding $r_\mu$
as possible then $\|f\|_D = 1$. These results are summarized
in

Tr. 6.1. $0 \le \|f\|_D \le 1$.
$\|f\|_D = 0$ if and only if $f \in F_D$. If $\|f\|_D = 1$ then $f(s) = 1 - r_\mu$
for all $s \in S_\mu$, $\mu = 1, 2, \ldots, m$, where $1 = (1, 1, \ldots, 1)^T$.
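The distance measure can be evaluated directly for small examples. The Python sketch below assumes the measure is the class-averaged squared difference between $f(s)$ and the target response, normalized by $k m$; the stimulus classes, the target responses and the two example functions (`exact`, `complement`) are invented for illustration.

```python
def distance(f, classes, responses):
    """Averaged squared deviation of f from the targets, scaled into [0, 1]."""
    k = len(responses[0])   # number of R-units
    m = len(classes)        # number of stimulus classes
    total = 0.0
    for S_mu, r_mu in zip(classes, responses):
        inner = sum(
            sum((f_c - r_c) ** 2 for f_c, r_c in zip(f(s), r_mu))
            for s in S_mu
        )
        total += inner / len(S_mu)
    return total / (k * m)


# Hypothetical problem: two classes of 2-bit stimuli, two R-units.
classes = [[(0, 0), (0, 1)], [(1, 1)]]
responses = [(1, 0), (0, 1)]

exact = {(0, 0): (1, 0), (0, 1): (1, 0), (1, 1): (0, 1)}
complement = {(0, 0): (0, 1), (0, 1): (0, 1), (1, 1): (1, 0)}

d_exact = distance(exact.get, classes, responses)
d_complement = distance(complement.get, classes, responses)
print(d_exact, d_complement)
```

The exact discrimination function sits at distance 0, while the function that complements every target response sits at the maximum distance 1.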

To  see  how  this  measure  of  distance  may  be  used  in 

requiring close approximations, or to see just how small
values of $\|f\|_D$ must be for some desired degree of
approximation we note

Tr. 6.2. (a) If

$$\|f\|_D < \frac{1}{k}$$

then $f(s) = r_\mu$ for at least one $\mu = 1, 2, \ldots, m$ and at
least one $s \in S_\mu$.

(b) If

$$\|f\|_D < \frac{1}{k \cdot m \cdot \max_\mu \eta(S_\mu)} \equiv \epsilon_0 \qquad (6.1)$$

then $\|f\|_D = 0$ and $f \in F_D$.

Proof. For part (a) consider a function $f$ which "misses"
being a discrimination function by just one "bit" for
every $s \in U_m$. For part (b) consider an $f$ which
"misses" by just one bit for only one $s \in U_m$. The
proof then becomes clear.
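The two extremal functions used in the proof can be written out for a small case. The Python sketch below assumes the same normalized distance as in Definition 6.1 and uses an invented problem with $k = 2$, $m = 2$ and a largest class of two stimuli; the dictionaries `one_bit_everywhere` and `one_bit_once` are hypothetical examples of the "misses" described in the proof.

```python
def distance(f, classes, responses):
    """Averaged squared deviation of f from the targets, scaled into [0, 1]."""
    k, m = len(responses[0]), len(classes)
    total = 0.0
    for S_mu, r_mu in zip(classes, responses):
        inner = sum(
            sum((f_c - r_c) ** 2 for f_c, r_c in zip(f(s), r_mu))
            for s in S_mu
        )
        total += inner / len(S_mu)
    return total / (k * m)


classes = [[(0, 0), (0, 1)], [(1, 1)]]
responses = [(1, 0), (0, 1)]
k, m = 2, 2

bound_a = 1 / k                                            # part (a)
bound_b = 1 / (k * m * max(len(S_mu) for S_mu in classes))  # part (b)

# Wrong in exactly one bit for every stimulus.
one_bit_everywhere = {(0, 0): (1, 1), (0, 1): (0, 0), (1, 1): (0, 0)}
# Wrong in exactly one bit for a single stimulus of the largest class.
one_bit_once = {(0, 0): (1, 1), (0, 1): (1, 0), (1, 1): (0, 1)}

d_everywhere = distance(one_bit_everywhere.get, classes, responses)
d_once = distance(one_bit_once.get, classes, responses)
print(d_everywhere, bound_a)
print(d_once, bound_b)
```

In this example both functions land exactly on the respective bounds, so a distance strictly below $1/k$ forces at least one correct response and a distance strictly below the bound of part (b) forces exact discrimination.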

The kind of probabilistic problems which should now
be investigated are to find conditions such that the
probability that $\|f\|_D < \epsilon$ can be made arbitrarily small.
This includes and generalizes the problems treated in [3]
(where $\epsilon \le \epsilon_0$) and the hope is that more practical
automata can be found for reasonable values $\epsilon > \epsilon_0$.

7. Finite Discrete Time Sequences.

In most automata it is assumed that a temporal
sequence of signals is to be applied and that a corresponding
sequence of responses then follows. If we assume that the
functioning of the device and the input of signals can
be adequately described by considering only finite discrete
sets of instants of time then our previous model is easily
extended to include such situations.

So let us assume that the only instants at which the
input signals and state of the automaton (including the
output signals) need be specified is the set

$$T = \{t_1, t_2, \ldots, t_i\}. \qquad (7.0)$$

Of course we assume $t_{\nu+1} > t_\nu$ and, although not required
here, it is also convenient to assume $t_{\nu+1} = t_\nu + \Delta t$,
$\nu = 1, 2, \ldots, i-1$. The sets $S$ and $R$ are those of
Definition 2.2 and the automaton has $n$ input lines and
$k$ output lines.

The input for the entire set $T$ now consists of
a set of $i$ binary vectors $s \in S$. We may write any
such total input set as a binary vector of dimension $ni$, say

$$\begin{pmatrix} s(t_1) \\ s(t_2) \\ \vdots \\ s(t_i) \end{pmatrix}. \qquad (7.1)$$

Similarly the response for the set $T$ is a binary vector
of dimension $ki$, say

$$\begin{pmatrix} r(t_1) \\ r(t_2) \\ \vdots \\ r(t_i) \end{pmatrix}. \qquad (7.2)$$

The procedure is now clear: we define the sets of all
$N = ni$ dimensional binary vectors (total inputs) and of all
$ki$ dimensional binary vectors (total responses).

Similarly corresponding set functions and discrimination
problems are defined in exact analogy to those of Section 2.
The results of that section then apply with only the
appropriate change in notation.
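The passage from a time sequence of stimuli to one long binary vector is a plain concatenation, as the following Python sketch shows. The helper name `stack` and the sample sequence are assumptions made for illustration; the sketch does not model any time delays inside the automaton.

```python
def stack(sequence):
    """Concatenate the vectors s(t_1), ..., s(t_i) into one long vector."""
    return [bit for vector in sequence for bit in vector]


# Hypothetical input sequence with n = 3 input lines and i = 2 instants.
inputs = [[1, 0, 1], [0, 0, 1]]
total_input = stack(inputs)

print(total_input, len(total_input))
```

A set function on such stacked vectors treats the whole sequence as a single stimulus, so the earlier results carry over once $n$ is replaced by $ni$ and $k$ by $ki$.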

In order to use the present generalized model to
analyze an automaton we would have to know precisely the
time delays in all of the units of that automaton. Thus
at the present we cannot apply it to perceptrons. However,
the automata described in [2] and [3], when restricted to
finite bounded time sequences $T$, are easily represented
as functions on the set of total input vectors to the set
of total response vectors.

FOOTNOTES

1. A single subset, $S_\mu$, is called a definite event by
Kleene. Thus we consider here the more general problem
of distinguishing between $m$ events, but in the more
restrictive sense of neglecting time delays.

2. A useful generalization of these notions is obtained by
replacing the $r_\mu$ by disjoint sets $R_\mu \subset R$. Then with
the introduction of $V_m = R_1 \cup R_2 \cup \ldots \cup R_m$, results
analogous to all of the following are easily stated
and proved. The $R_\mu$ are to be considered as sets of
equivalent responses. With this generalization any
strong discrimination problem of order $m$ can be shown
to be equivalent to an ordinary discrimination problem
of order at most $m+1$.

3. For fixed $n$ it is quite clear that there exists a $k$,
sufficiently large, such that discrimination functions
exist for all $I_m$ and $O_m$ with $m \le 2^n$. The
arithmetic of this construction and some rather fanciful
implications of it are contained in [5].

4. Notice that patterns may only be rotated by integral
multiples of $\pi/2$. Translations may be allowed to
carry a pattern across the boundary of the unit square
if we require the same translation to be applied to
replicas of the pattern in all neighboring unit squares.
Thus the part lost, say in $x > 1$, is returned from
$x < 0$.

5. In [4] the input signals themselves are said to be
either "e" or "i". This seems to be an unnecessary
complication and is at variance with the types of signals
transmitted in digital devices.

6. This observation indicates that the graphs in Figs.
5a, 5b, 11 (and perhaps others) of [4] cannot be correct;
they must at least be step functions.

7. The notion of value distinguishes these A-units from the
special neurons of [1] and [3]. However, if $v$ can
have only a finite set of values, say $\le 2^q$ of them,
the A-units could be constructed of $q$ much simpler,
"single-output" units (see [2] and [3]).

8. It is implied in this expression that $v_\sigma(t) = 0$ if
the σ-th input line is not stimulated. This is clarified
later in the complete model of the total nerve-net.

9. It should be mentioned that neurons with variable discrete
time lags, depending upon the inputs, are briefly
suggested in [2] and [3]. It would seem that such units
can also be composed of the simpler basic units discussed
in these papers.

10. Such problems are considered, for different automata, in
[2] and [3].

11. All the results of this section are much more general
and apply for arbitrary real matrices $W$, $Y$ and $V$ of
the indicated orders.

Figure 1: Schematic diagram of an A-unit.

Figure 2: Schematic diagram of an R-unit.